Machine learning: Data protection’s next frontier

November 1st, 2018

by Mark Cassetta

As you may have read, today we announced the introduction of TITUS Intelligent Protection, which will enable our customers to leverage machine learning to accelerate the adoption of a data protection strategy that works for them. While we’re excited about this, what I want to talk about is machine learning and the future of data protection.

TITUS was first to market in 2005 with a data classification solution that became widely adopted across global organizations. Since then, we’ve seen the market dramatically change as it went from early adopters through to validation with major platform players waking up to classification’s importance in the data protection equation. Being first to market means we’ve seen it all – the good of course, but more importantly, we’ve learned why enterprises struggle to adopt data classification and ultimately, data protection. Our job as an enterprise software company is to continue to push innovation so we can make that adoption experience as frictionless as possible. Machine learning will be key to doing that successfully.

It’s not about the user

When deploying data classification, organizations shift accountability for their sensitive information to users and away from security professionals. While we believe this is critical to creating a culture of security within an organization, we need to ensure we are assisting people as much as possible in this experience. The good news is that technology such as machine learning has become democratized and is now used pervasively to solve many problems.

Unsurprisingly, data classification leaders, including TITUS, have traditionally championed the user as the critical element in identifying and then applying proper classification to data they’ve created. After all, as creators, they, more than anyone else, know how valuable it is and the best security to apply. But that’s not always true. Users are human, and humans make mistakes. In fact, these mistakes speak to the essential challenge organizations face in deploying data classification and, more broadly, data protection solutions – confidence.

Deploying machine learning as part of your organization’s data protection strategy can provide the critical assistance users need to apply the proper safeguards to data they’ve created without adding friction to their day-to-day activities. In fact, as you start to inform and develop your corpus, machine learning can go one step further and remove the user from the equation while increasing confidence in your organization’s ability to identify, contextualize and classify its unique data.

This is a departure from where we’ve been in the past, but it’s a necessary departure. Adding machine learning capabilities will reduce errors and result in more rapid adoption of a data protection strategy that’s unique to your organization.

A new way of thinking

This sounds revolutionary, doesn’t it? That may be so, but it also sounds familiar. Think about when Box and Dropbox entered the collaboration market. They fundamentally and forever changed the way we think of collaboration. Like that example, machine learning is not a feature or a singular solution. It is a market disruptor that marks a new way of thinking about data protection. If you don’t believe machine learning will change the way organizations protect sensitive data, you may well be left behind.

That said, it’s also not a ‘silver bullet’ that will solve all your organization’s data protection concerns. This is a new approach that’s still maturing and will continue to evolve over time as it becomes more ingrained in your overall approach to information security. Approaching this as the one true answer to your data protection challenges isn’t the right approach. Machine learning is a new way of thinking about data protection and a journey – a necessary one, at that.

A necessary journey … together

As we’ve talked to customers about TITUS Intelligent Protection and what machine learning can mean for their data protection initiatives, the biggest question we get is, “How do I start?” As I said, machine learning isn’t a destination, but a journey, and one that will require a partner. It requires you and your organization to buy into the fact that your data, especially your sensitive data, is unique to your business. It will take time to fine-tune models to fully understand and reflect this uniqueness.

I can tell you this from firsthand experience. TITUS has been on a machine learning journey for some time now. We know that starting small and starting fast will be the best way for your organization to feed its corpus and understand the context around all your unique data. We’ve had to think about what a corpus means for us, and what a policy looks like for us as we continue this journey.

I believe that because we’ve been on this journey, we can confidently partner with your organization to help you take advantage of this amazing technology. Our team has changed the way we think about data protection – we’ve disrupted ourselves! We’ve broadened our professional services offerings beyond our core data classification and data loss prevention services to include ways to help our customers build their models. We understand the work your organization needs to do because we’ve done it, too.

At this point, some of you may have concluded you’re not quite ready to deploy this type of technology within your organization, and that’s okay. But the pervasiveness and democratization of machine learning will continue whether your organization is ready or not, so if you can’t start today, you need to start thinking about when you’ll be ready because this is a journey you’ll need to take.

Excited? We are. This is the beginning of a revolutionary way to think about data protection.

Mark Cassetta, senior vice president of strategy, is responsible for the execution of product strategy at TITUS. His diverse background, including roles in marketing, business development, corporate strategy, applications development, and enterprise software, helps to inform his approach.


The impact of machine learning on how we live and work

October 16th, 2018

Charlie Drake has returned to the blog today to talk more about machine learning and the impact it’s going to have as it’s adopted in workplaces around the world. 

Machines aren’t learning about me, are they?

The truth is YES, THEY ARE. If you shop online, use any social media channels, or use those handy auto shop scanners or self-checkout machines to save time when you go to any large supermarket, then machines and AI are learning about you every day!

I’ve been musing over all the different ways my life is already interacting with machines day to day and this then sparked a bit of research as to what the future might look like for me and for us all.

In what ways does technology make our life experiences better, transactions quicker and enrich our online interactions?

My last blog spoke a bit about the human side of machine learning and how embracing it could mean people will be able to guess less and focus more. But if you think about it, we humans already spend a lot of our lives learning and ‘classifying,’ just in a slightly different way to the machines.

How humans classify based on data

Think back to when you were a child and how you built your own version of classification to keep you happy and safe. You’d use the various ‘data points’ you’d been fed to make choices in your life. Your parents might tell you not to touch the oven when it’s hot, or to hold their hand and wait for the green light before you cross the road.

You might have a negative experience with a local stray dog, which then causes you to classify all dogs as dangerous and you may still be scared of them today. Or watching those first few scary movies when you are a teenager could mean you now avoid all clowns like your life depended on it!

On a more positive side, you might get a thrill from the first few running races you take part in at school or realise you get a little thrill when you get a sum right. You classify these things as ‘fun’ and, ultimately, those experiences or ‘classifications’ have more than likely led to your choice of career or favourite sport in adulthood.

And what about the machines?

If you think about it, the way machines learn and classify isn’t all that different to how we do it as humans, just with much less emotion and potential bias.

There are hundreds of examples I could give but here are just a few:

  • Netflix uses ML to guess your mood and recommend the movies or series that you’ll be most interested in.
  • Ever used Tinder? Well for all those who have found a Hot Match, thank you machine learning!
  • Amazon is another great example of how machines can learn the products, books and other goodies that might appeal and cause you to purchase more.
  • You can even point your camera at a menu in another country and see the choices in your own language via the Google Translate app.

Much of our day-to-day technology is powered by AI and machine learning, so there’s no reason why the successes of ML in our personal worlds can’t be translated through to our lives at work.

Machine learning is like a dictionary

Think of it like the difference between a heavy old 1980s dictionary tome, which you had zero chance of ever reading cover to cover, and the online dictionary and spell checker programmed into all your office programmes: It seems like second nature but you now have the entire dictionary at your fingertips.

That’s old school machine learning: Working WITH humans to get the right outcome. Nowadays it’s about feeding the machine with as much information as possible and watching it support your business’s information management and make you more secure to boot.

Leading organisations are already using machine learning-based tools to automate decision processes and to support digital transformation.

It’ll be the biggest disruptor in the next few years and will definitely impact every workplace by:

  • Enhancing customer service by learning from both past and current interactions,
  • Helping companies hire the right people based on CV language analysis and removing human bias,
  • Detecting fraud by spotting things that humans might miss, and/or
  • Identifying challenges and issues in processes or ways of working.

Machine learning is here to stay and it will rock our world

If you could feed the machine thousands of examples of documents or e-mails that were standardly used in your business, it could easily use the ‘knowledge’ gained from this data to help you classify correctly and even teach you when you aren’t sure. It doesn’t remove the need for human decision making, which will always be needed. It gives you a view of how thousands of other documents have been classified before.
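
To make that concrete, here is a minimal sketch in Python of a machine learning from documents that have already been classified. It is an illustration only and does not describe how any particular product works: the labels, the handful of sample texts and the function names are all made up for this post, and the technique shown (a simple naive Bayes word count) is just one common way to do it.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the most likely label using naive Bayes with add-one smoothing."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Start from how common the label is, then add evidence word by word.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            seen = word_counts[label][word]
            score += math.log((seen + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-made examples; a real corpus would hold thousands.
EXAMPLES = [
    ("quarterly salary review for employee", "confidential"),
    ("customer passport number and bank details", "confidential"),
    ("employee payroll and bank account", "confidential"),
    ("team lunch menu for friday", "public"),
    ("office party invitation friday", "public"),
    ("press release about new product launch", "public"),
]

word_counts, label_counts = train(EXAMPLES)
suggestion = classify("bank details for payroll", word_counts, label_counts)
print(suggestion)
```

The suggestion is only as good as the examples it was fed, which is why the person handling the document still makes the final call.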

You can save big bucks by using machine learning to simplify ways of working or spot needed changes to systems or processes. Because when you put great information in, you get great results out and could save thousands of person-hours in your organization.

Machine learning is making tasks easier in our personal lives and increasingly in our professional lives as well, especially around data protection. Keep watching – the best is certainly yet to come.

Charlie Drake has had a diverse management career covering internal communications, employee engagement, leadership management, culture change and CSR. She’s developed a strong reputation as a creative leader who always makes a difference. She’s had roles at Aviva, Vodafone, Virgin Media and others where she’s turned complex strategies and processes into tangible, understandable stories for employees.


Challenging the status quo for data protection

October 10th, 2018

As cyber breaches continue across networks, endpoints, public and private clouds, public sector departments need to think about a strategy that is inclusive of new approaches to protecting their mission-critical information.

Recently, I had the pleasure of speaking at the Palo Alto Networks Federal Government Summit, which brought together leading technology companies in the cybersecurity industry to address this very topic.

While each company had a unique position based on their area of technology expertise and focus, a common thread emerged – an effective security strategy must be highly adaptive to combat existing and new threats.

While governance and compliance regulations attempt to address a component of the problem to ensure adherence to security standards, this is just one aspect to a broader challenge to protect sensitive data.

One such new option is to establish a best-of-breed integrated data loss prevention strategy that is effective in both its design and capabilities. It is inclusive of technologies within an ecosystem that incorporates network firewalls, CASB and data classification solutions in addition to email encryption.

At the Summit, I introduced our partnership with Palo Alto Networks in the context of this integrated DLP offering, which brings together TITUS Data Classification solutions and the Palo Alto Next-Generation Security Platform. TITUS identifies data in the workflow to provide Palo Alto visibility into the sensitivity of the data flowing across the network and into the Cloud. Users and administrators can seamlessly identify and control the flow of sensitive data, meet stringent regulatory requirements, and optimize prevention policies to focus on the highest-risk areas, no matter where the information resides – across the desktop, on mobile devices, data centres, and in the Cloud.

A crucial aspect of this process is the role that employees need to play to ensure the data they work with every day remains protected as it travels throughout the workflow. For government agencies and corporations, this means that employees need to be engaged in a culture of security each day to prevent personal data from leaving the network, data leaks at data center levels and personal loss of data on mobile devices and insecure systems.

The need to accelerate the adoption of a comprehensive data protection strategy has never been greater. Nor has the need to constantly challenge the status quo. Incorporating new processes and technologies into a flexible security framework is fundamental to ensuring our Federal Government remains on the front line of cyber defense in a world where the next attempted data breach is just around the corner.

Have any questions regarding your organization’s data protection plan? Click here to request a call back.

Mark Cassetta, senior vice president of product management and strategy, is responsible for the execution of product strategy at TITUS. His diverse background, including roles in marketing, business development, corporate strategy, applications development, and enterprise software, helps to inform his approach.


Information security: Data protection starts with classification

October 5th, 2018

by Scott Hubert

We’ve been saying for a long time that data classification is the foundation for data security, because it helps you identify the data you have so you can give it the right level of protection.

Data protection is a hot topic right now, with data breaches and misuse of data making the headlines every week. People are concerned about how much of their data is being collected and used by organizations all over the world.

Those same organizations are hiring people who sign information security policies on day one of their employment. The problem is those policies aren’t always accompanied by tools and training that put the policy into action.

We all know it’s important to keep data secure, but does everyone know what it takes to make that happen?

The importance of information security

It’s become an unquestioned business imperative to protect the data of individuals, customers and employees, along with internal sensitive information. GDPR and other regulatory requirements have established standards based on the expectations people have for how their data is handled.

We used to be concerned about the corporate perimeter, but that perimeter has shrunk to encompass the data itself. And organizations that aren’t evolving risk their reputation and their very existence.

Data security starts with context

Unstructured data – the documents and emails we work with every day – are a smorgasbord of juicy information just waiting to be used for good. Or bad, depending on who has access.

Classifying those documents gives them the context required for security policies to kick in and trigger the right level of protection, ensuring organizational data has the right protection applied.

Enhance security with machine learning

Lately, data security vendors are abuzz with the advantages machine learning adds to the security ecosystem. Imagine creating documents, saving them and knowing the data is accurately and consistently classified and the appropriate protection is applied seamlessly – all within an automated workflow, enhanced by machine learning.

Machine learning brings an intelligent level of protection to organizations so data is protected, and time is saved on the front and back ends of the process.

Protect your data, protect your organization

You can only adequately protect the data you know you have. That’s why data classification is the foundation of data security. It provides context for reporting, triggering the right policies at the right time to keep your organization from facing the ultimate risk to trust: a data breach.

Recently, Jim Barkdoll, CEO of TITUS, talked about this with Varun Haran, senior editor with Information Security Media Group. Listen to the full interview to learn more about how data classification can augment your data security strategy.

Scott Hubert, director of product management at TITUS, has a diverse background managing people, projects and process in the development and delivery of data security solutions.


Top 3 data classification use cases to build a foundation for data protection

September 28th, 2018

by Matt Luckett

Today’s organizations have many tools at their disposal to help workers be more productive, but those tools come at a price. Data is increasing exponentially, and organizations have a responsibility to handle and protect it appropriately – from collected data to internal documents that contain sensitive information.

I’ve worked with many customers and prospects over the years and I’ve learned a lot about how valuable data classification is for building a foundation to protect any organization’s data.

The following three data classification use cases show how organizations can leverage classification to ensure data is protected from creation to deletion.

1) Identify the data you have and where it’s stored

Data identification works like an inventory of valuables for insurance. The data is appraised to determine its value. Is it internal or even restricted content? Or is it content that’s for customer consumption?

By identifying the value of what you have and knowing where it’s stored, you can take the right steps to protect it based on its value to the business, starting with classifying the data.

2) Empower your data security ecosystem

The demands of data protection today require significant investments in time and tools for organizations to meet compliance regulations and maintain the trust of consumers. That means using a variety of tools to build a secure infrastructure that covers the various platforms that enable productivity in your organization.

Data classification helps inform the other tools in your security program with embedded metadata, so they properly handle data.
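
As a rough illustration of that hand-off, the Python sketch below embeds a classification label in a document's metadata and lets a downstream check read it. Everything here is hypothetical: the four levels, the `Document` type and the function names are assumptions made for this example, and real security tools read metadata in their own formats.

```python
from dataclasses import dataclass, field

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "confidential", "restricted"]


@dataclass
class Document:
    name: str
    metadata: dict = field(default_factory=dict)


def classify(document, level):
    """Embed the classification label in the document's metadata."""
    if level not in LEVELS:
        raise ValueError(f"unknown classification level: {level}")
    document.metadata["classification"] = level
    return document


def allowed_to_leave_network(document, max_level="public"):
    """Stand-in for a downstream tool (such as an email gateway) reading the label.

    Documents with no label are treated as the most sensitive level.
    """
    level = document.metadata.get("classification", LEVELS[-1])
    return LEVELS.index(level) <= LEVELS.index(max_level)


memo = classify(Document("pricing-memo.docx"), "confidential")
flyer = classify(Document("event-flyer.pdf"), "public")
unlabelled = Document("notes.txt")

print(allowed_to_leave_network(memo))        # False
print(allowed_to_leave_network(flyer))       # True
print(allowed_to_leave_network(unlabelled))  # False
```

Treating unlabelled documents as the most sensitive level is a design choice in this sketch; an organization could equally decide to prompt the user for a label instead.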

3) Enable users to be part of the security solution

Technology doesn’t replace the need for human input. Giving users a tool that helps them stop and think about the value of the data they’re handling every day spreads the accountability across your organization.

People know their data best, so who better to take that first step of identifying and classifying the data to be protected through its lifecycle?

Learn more about how you can leverage data classification to protect your data

Join me for a webinar on October 4 at 1 p.m. EDT to learn how you can use data classification to strengthen your organization’s approach to information and data security.

Garrett Bekker, Principal Analyst, Information Security at 451 Research will join me to show how to build a business case for data classification so you can win the support of your executives, business unit leaders and employees.

Register to attend the webinar now!

Matt Luckett is a business and systems analyst who collaborates with our clients to ensure their business needs to protect data are fulfilled throughout the process.


The challenge and process of data protection by design and default

September 25th, 2018

Today, we welcome guest contributor, Allan Boardman, to the TITUS blog. Mr. Boardman is an active member of ISACA and has extensive experience in the areas of information risk management, security, and data protection. This post is the first in a series focused on compliance requirements and how organisations can get out of the rut of playing catch-up to emerging regulations and establish future-focused security programs that build trust while meeting the needs of the business.

The European Union’s General Data Protection Regulation (GDPR), which became effective in May 2018, is designed to unify data privacy requirements across the European Union (EU). It affects any organisation that markets to or processes the information of EU Data Subjects, which include end users, customers, and employees. Whilst organisations across Europe and further afield have been scrambling to ensure that they conform to the GDPR rules, their work is far from done. That’s the not-so-good news. The better news is that the requirements of GDPR give organisations a blueprint to start thinking differently, and smarter, about how to build data protection into their processes by design and default.

What are the challenges to protecting data by design and default?

The sheer volume of data is a challenge, but that barely scratches the surface of the factors to be considered.

  • Who are the owners of data?
  • What data do you have?
  • Where is it handled, processed, stored and archived?
  • Who do you share it with and how do they get access to it?
  • What is it – sensitive or non-sensitive?
  • How do you protect each category of data appropriately?
  • Who’s using it and needs access?
  • How do you help people understand the stakes and their role in data protection?

These are complex questions but you can’t build an effective data protection program without knowing the answers. You need real clarity about what you need to protect and how you’ll classify personal data right from the start. But how do you go about doing this?

1) Incorporate data protection design into your change management process

It’s simpler to work through data protection design requirements when you’re implementing new tools and processes. But many companies are working to retrofit new requirements into existing systems. It’s critical to get the right people at the table during the design phase. GDPR requires a data protection impact assessment, so engaging data owners and users in the process will help identify threats and risks that need to be addressed in the process.

Data owners and users can provide invaluable insights into the state of data – what you have, how it’s used. And an information security expert can help you understand and navigate the regulatory demands while giving insight to the best ways to protect your data.

It can seem like a great idea to protect everything the same way or encrypt it all. However, that doesn’t necessarily match the level of protection to the level of risk. By involving the right people in the planning and design phase, engaging the business and educating employees, you will begin to build a trusting relationship between your security team and your business units. And you’ll build a data security program designed to protect data appropriately at every stage.

2) Address data classification early and thoroughly

The business must be fully engaged in the threat identification and data classification stages or you won’t get the full benefits of a classification system or schema. Why? Because data classification provides a foundation for you to identify the data you have and its value to your business. The data classification piece needs to be clearly understood and fully addressed right from the start as this creates an important foundation for driving the level of protection required throughout the lifecycle of the data.

It is also important to clearly differentiate between specific categories of data including personal data because the associated risks may be different. And keep in mind that the value of data could be related to the impact of losing it rather than the business value of the data itself.

As a general rule, you should establish and have clear data classification rules without getting too granular. Know what you need to protect and how you want to handle that data within the framework of your policies.

3) Educate people to make proper judgment calls and informed decisions

For data classification to be successfully implemented, it is essential that the users are provided with clear guidelines and instructions which are easy to follow and understand. People need to understand what’s at stake so they have the context to make the right decisions when they’re handling data. Doing this effectively involves a combination of ongoing awareness, regular training, and comprehensive education.

This provides clarity of roles and responsibilities in protecting personal data. And the relationships established with the security team in the design phase will help to ensure that people communicate risks more easily and help seek out solutions. The outcome? Your organisation will become more effective at meeting its compliance requirements and managing its information risks in a dynamic and evolving business environment.

The foundation of knowledge about the regulations, the types of data being collected and used in the organisation, and the risks associated with handling data makes technological solutions for data security more effective too. With the right tools, organisations can help people put that knowledge into action every day in the flow of work. Those tools take the written policies that we’ve all signed off on and make them come alive, and in so doing empower people to follow through in their work.

Remember, start with intentional data protection by design and default

When you design your systems for data handling through the lens of protecting the data first, you put your organisation on the right path to set up people, processes and technologies for success.

Allan Boardman, CISA, CISM, CGEIT, CRISC, CISSP, is a seasoned business advisor focusing on information risk management, security, and data protection. He has served on ISACA International’s Board and is a regular speaker at conferences across the globe.


The future of data protection and machine learning: Guess less and focus more

September 18th, 2018

We’re thrilled to welcome Charlie Drake as a guest contributor to the TITUS blog, where she’s kicking off a series of blog posts on how to help your people understand and get engaged in the process of securing data – even as more and more organizations are adding machine learning and other artificial intelligence capabilities to their security arsenal. Charlie is a creative communicator whose work influences cultures and helps change mindsets. She has a passion for people and the role they play in enhancing the impact of technology.

Every one of us has been at that point where it’s all got a bit too much and you can feel the pull to just get off the merry-go-round of your life, stop for a bit, reflect, and take stock.

And I’ve been doing just that lately, pondering over lots of things during my time off. I’ve been thinking about the future a lot.

What will happen from a political, environmental and technological standpoint? And how will this affect me and those I care about?

Although I’ve been thinking about work as little as possible, I do keep coming back to what the workplace of the future looks like and what it will feel like for all of us.

The impact of technology on the workplace

It won’t surprise any of you that, as employees, we all have much bigger and more complicated roles than we ever did in the past.

Technology has transformed the workplace in the last two decades. Many modern startup businesses seem to be jumping on that roller-coaster and embracing the thrill of these new ways of working and being effective. But it also feels like many other, more traditional businesses could be being left behind.

You know what else became really clear to me as I was pondering this? Just how important it’s going to be for businesses to invest in making sure their people not only understand how significant things like artificial intelligence (AI) and machine learning will be in every role and process very soon, but also how critical it is that they embrace and accept it as the helping hand it really is.

It’s human nature to panic about change

I’ve always been fascinated with how people think and what drives them to behave in certain ways. It’s why I’ve always had a knack for change communications and campaigns around security and ethics.

People will quite often naturally panic about new technologies because they seem too ‘out there’ or scientific, which then moves to them thinking about films like Terminator or AI and how machines or technology could take over – funny thing, the human mind!

Now that machine learning (ML) is being adopted and played with by pretty much every company (whether you’re aware of it as an employee or not), we’re moving into a brave new world which will be an awesome rollercoaster ride for those who embrace it and more of a house of horrors for those who don’t.

The business benefits of machine learning

Machine learning is transforming how companies deal with fraud/risk indicators, saving companies millions. It’s helping to cross-sell relevant products to customers based upon what it’s learnt about their behaviour: How do you think Amazon or Netflix know what you want to buy or watch before you do?

And in the workplace, ML is being used for data classification.

I’ve introduced classification technology in two businesses and I can really see it changing how we assess and classify information, helping us all become more secure by allowing employees to guess less and focus more. Leading companies in this space, such as TITUS, are embracing machine learning and leading the way in taking it to the next level to support their customers.

Machine learning allows people to guess less and focus more

It’s safe to say most employees view essential, everyday tasks like the classification of data as a pain or an unnecessary task for them to do. (Don’t even get me started on GDPR!)

When someone is busy, overwhelmed or even disengaged, they are much more likely to make quick decisions and potentially classify wrongly or not even bother changing the default to get that email out or report finished. This will most certainly lead to breaches or other issues later down the line.

Trust in technology to help minimize risk

So, what if you could confidently trust the intelligence of a system that has interrogated and understood the classification parameters of your business? What if that system has already read hundreds of thousands of documents produced by your employees?

Once the technology has learnt those parameters, the system can classify documents automatically for you and even flag to you if you’ve classified something wrongly based upon its content.

I believe that very soon, the machine learning technology being developed by companies like TITUS will take the stress out of data classification and give strong reassurance to execs that their information is being protected.

Now, wouldn’t that make life easier?

Charlie Drake has had a diverse management career covering internal communications, employee engagement, leadership management, culture change and CSR. She’s developed a strong reputation as a creative leader who always makes a difference. She’s had roles at Aviva, Vodafone, Virgin Media and others where she’s turned complex strategies and processes into tangible, understandable stories for employees.


Deconstructing Three Machine Learning Myths

August 30th, 2018

by Mark Cassetta

Last week, I wrote about how machine learning can be practically applied to data classification and protection. There’s no denying the fact that machine learning can help you identify and protect sensitive information. It reduces file analysis to seconds or minutes, and gives people more time to analyze results and determine what needs to be done next.

However, there are many who believe that machine learning is a silver bullet solution to their security problems. But as Raffael Marty, VP of Corporate Development at Forcepoint and one of the leading industry experts on big data and analytics, points out, relying heavily on technology is dangerous and can even create a false sense of security.

So, how can machine learning help you keep your organization’s most sensitive information secure?

The first step is to break down three common machine learning myths.

Machine Learning Myth #1: Machines just know what sensitive information is.

As business leaders, we assume our most sensitive information is locked down. And if a piece of sensitive information is leaked, we can simply push a few buttons to find out where it is, and bring it back to safety.

That’s a myth.

The use of full automation certainly has its place in areas such as detecting anomalies or operating within a concrete set of rules. But most organizations don’t have mature data management practices.

Less than 1% of unstructured data is analyzed or used at all and people have access to more data than they should. The direct identification of sensitive material is too difficult for an algorithm to learn on its own, especially when it can’t factor in context or nuance without prior knowledge.

Tip: Make sure everyone in your organization understands what classification or category needs to be attached to a document based on the type of information it contains.

Machine Learning Myth #2: Machine learning will work wonders with tons of data.

Just like any company that has deployed a CRM program, it’s all about the quality of the data that goes in rather than the volume of it. It’s no different for information security and, as the old saying goes, “garbage in, garbage out!”

It takes time and resources to collect, analyze, double and triple check, and finally prepare a really good and accurate data set, based on the context of its regular usage, to train your algorithms.

That’s the most practical way to get machine learning to precisely derive the information you have and help you determine what you need to do to protect it.

Without reliable training data sets, you run the risk of frustrating users as well as data stewards or security analysts. If your users get frustrated, you’ll face a constant uphill battle to get them involved and support your security program, no matter how much promise a piece of technology has to make their lives easier.

Tip: Bring in the right stakeholders from across your business to understand what type of data they have and what their policies are for information that is created, shared, stored, and deleted.

Machine Learning Myth #3: You can build and train an algorithm once, and it will do the rest on its own.

Data hoarding is a real problem, but the sensitivity of information changes all the time. What might be classified as sensitive today may eventually become public information down the road because context changes. This is why it’s important to separate “what” something is (its category) and “how” it should be handled (based on its sensitivity).
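
To make the “what” versus “how” split concrete, here is a minimal sketch in Python. It assumes a category keeps its name while the handling rule attached to it changes over time. The category name, dates, and the names HandlingRule and classification_on are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HandlingRule:
    """How a category should be handled from a given date onward."""
    classification: str
    effective_from: date


# Hypothetical example: the category ("what") stays the same,
# while its handling ("how") changes once the context changes.
HANDLING_RULES = {
    "Earnings release": [
        HandlingRule("Restricted", date(2018, 1, 1)),
        HandlingRule("Public", date(2018, 8, 1)),
    ],
}


def classification_on(category, day):
    """Return the classification in force for a category on a given day."""
    applicable = [
        rule for rule in HANDLING_RULES[category] if rule.effective_from <= day
    ]
    if not applicable:
        raise LookupError(f"No handling rule for {category!r} on {day}")
    # The most recent rule that has already taken effect wins.
    return max(applicable, key=lambda rule: rule.effective_from).classification


before = classification_on("Earnings release", date(2018, 7, 15))
after = classification_on("Earnings release", date(2018, 9, 1))
```

Because the category is stored separately from its handling, re-assessing sensitivity means adding a rule rather than re-categorizing every document.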

Forrester Research’s July 2018 report entitled Rethinking Data Discovery And Classification Strategies refers to this as “dynamic data classification”, which currently requires employees and tools for automation and enforcement for information security.

With the context evolving so quickly today, it’s not practical to assume algorithms can learn and identify sensitive information unique to your organization on their own. As with humans, improvement requires re-assessment, and we need methods and processes for that to take place.

Tip: Schedule reviews and have feedback mechanisms in place to review your information security policies specifically for data classification, information lifecycle, and data governance.

It’s not about adding more layers to security

Technology will continue to have a tremendous impact on how we run our businesses and keep information secure. But it’s clear we need to adapt our thinking and our approach to how we protect data.

It all starts with how we define data in this digital age: Data isn’t merely documents that we create and use for a single purpose, so why do we treat it that way?

People will always be a top security risk for data loss and breaches – that’s just reality. So, the solution isn’t to add more layers of security for the sake of it. The solution is to have those layers work together – and this includes machine learning – to strengthen your information security program in the way that makes the most sense for your business.

Mark Cassetta, senior vice president of product management and strategy, is responsible for the execution of product strategy at TITUS. His diverse background, including roles in marketing, business development, corporate strategy, applications development, and enterprise software, helps to inform his approach.


A Hype-Free Look at Machine Learning and Information Security

August 24th, 2018

by Mark Cassetta

There’s no doubt that machine learning is having a significant impact on information security. But how can it be applied to data protection? What types of algorithms are the best to use? Can’t organizations simply buy a third-party data set or model?

These might seem like simple questions to answer. And yet, there still appears to be some confusion about what is and what isn’t possible with Artificial Intelligence and machine learning, particularly in cybersecurity. Part of the challenge is that terms like AI, machine learning, deep learning, automation, and others are used interchangeably. The fact is they’re not all the same. As a result, their application and expected outcomes for data protection need to be given the proper context.

So, I’d like to offer a practical, hype-free look at how machine learning can help with data classification and protection.

Let’s first define machine learning.

What is machine learning?

It’s the use of algorithms to learn from and make predictions based on data. The ultimate goal is that with enough representative, high-quality data, machine learning algorithms can learn on their own so they end up identifying or finding patterns in new or existing data quicker, enhancing our ability to make decisions.

When it comes to data classification and protection, there are several different approaches and algorithms that can be applied to process data, perform a calculation, and then carry out automated tasks or make suggestions.

This presents another point of confusion in the security market. What is the best type of algorithm for data classification? Can’t algorithms just determine what’s sensitive and what isn’t? Can deep learning systems properly categorize unlabeled data?

The challenge with sensitive data

Here’s the thing: There is no universal definition of what is sensitive and what isn’t. Sure, we know a few things are sensitive, like social security or social insurance numbers, banking information, and personal health information. But what about business plans, product documentation, intellectual property, and other types of information people deal with on a regular basis?

This is when the definitions start to blur. That’s why context is so important for data classification and why the combination of people and technology is the best way to keep information protected today.

Why machines need context

Content-based categorization, which uses the words and phrases in documents to build machine learning models, helps distinguish categories of documents. This approach leverages the knowledge that people have of their business, its processes, the types of data it has, where it’s located, and what the specific security policies are, so algorithms learn and can adapt to specific business needs.

The only way to enhance the accuracy of your use of machine learning is for you and the people in your organization to decide what the categories are and provide representative document examples for these categories. That way, you can train models with the best source of information: examples that are specific to your business and to how it documents that information, which brings in the most important piece – context. So, when a new document is provided, your system will pick up the words and phrases it contains that indicate the category it most resembles.
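
As a rough illustration of content-based categorization, the sketch below trains a very small word-count model on labeled examples and scores a new document against each category. It uses a naive Bayes calculation with add-one smoothing, which is a stand-in chosen for brevity and not necessarily what any particular product uses. The example texts and the function names tokenize, train, and categorize are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count word occurrences per category from (category, text) pairs."""
    counts = {}
    for category, text in examples:
        counts.setdefault(category, Counter()).update(tokenize(text))
    return counts


def categorize(counts, text):
    """Return {category: probability} for a new document.

    Naive Bayes with add-one smoothing and equal priors for every category.
    """
    vocabulary = {word for words in counts.values() for word in words}
    tokens = tokenize(text)
    log_scores = {}
    for category, words in counts.items():
        total = sum(words.values())
        log_scores[category] = sum(
            math.log((words[token] + 1) / (total + len(vocabulary)))
            for token in tokens
        )
    # Convert log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {c: math.exp(score - top) for c, score in log_scores.items()}
    norm = sum(weights.values())
    return {c: weight / norm for c, weight in weights.items()}


# Invented training examples; real ones would come from your own documents.
EXAMPLES = [
    ("Design document", "architecture diagram component interface design requirements"),
    ("Finance report", "quarterly revenue forecast budget expenses profit"),
    ("Public announcement", "press release announce launch customers available today"),
]

model = train(EXAMPLES)
probabilities = categorize(model, "Draft budget and revenue forecast for next quarter")
best = max(probabilities, key=probabilities.get)
```

With one short example per category the probabilities are only indicative; the point is that the words people supply in the examples are what drive the result.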

Let’s look at an example.

A practical example of machine learning applied to data classification

Let’s say you have the following document categories:

  • Category A: Design document
  • Category B: Finance report
  • Category C: Public announcement

And you now wanted to assign each of these categories the following classification:

  • Category A: Design document – Internal classification
  • Category B: Finance report – Restricted classification
  • Category C: Public announcement – Public classification

The next step would be to create a collection of these documents to train your algorithm. You can apply a policy that says, “If a document is categorized as a Design document with a probability of 80%, then automatically classify it as Internal.”
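
The policy above can be sketched in a few lines of Python. The category-to-classification mapping comes from the example; everything else is an assumption of this sketch: the 80% figure is read as a minimum, the function name apply_policy is hypothetical, and falling back to a suggestion for the user below the threshold is one possible choice among several.

```python
# Mapping taken from the example categories above.
CATEGORY_TO_CLASSIFICATION = {
    "Design document": "Internal",
    "Finance report": "Restricted",
    "Public announcement": "Public",
}

# The 80% probability from the example policy, treated here as a minimum.
AUTO_CLASSIFY_THRESHOLD = 0.80


def apply_policy(probabilities, threshold=AUTO_CLASSIFY_THRESHOLD):
    """Return (classification, action) for a dict of category probabilities."""
    category = max(probabilities, key=probabilities.get)
    classification = CATEGORY_TO_CLASSIFICATION[category]
    if probabilities[category] >= threshold:
        return classification, "auto-classify"
    # Assumption: below the threshold, the person makes the final call.
    return classification, "suggest-to-user"


confident = apply_policy(
    {"Design document": 0.91, "Finance report": 0.06, "Public announcement": 0.03}
)
uncertain = apply_policy(
    {"Design document": 0.55, "Finance report": 0.40, "Public announcement": 0.05}
)
```

Keeping the threshold as a parameter lets each organization decide how much confidence it needs before a machine acts without a person.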

The accuracy of machine learning relies on the quality of the training data, which relies on the uniqueness of words found in the training data categories. Algorithms try to find the patterns that distinctively define a category. Therefore, it’s important to provide enough example documents to train the algorithm because accuracy will suffer if the examples differ from real-world context.

At the end of the day, you want a more consistent, accurate, and efficient way to identify and secure data. You can only do that if you spend time developing and training data set(s) based on the categories you want to identify. These categories should reflect the types of documents and emails you wish to categorize based on the types of data your organization handles. Remember, your definition of what is and isn’t sensitive is unique to your organization.

Machine learning is about outcomes

As much as we want people to be our strongest security link, they can benefit from the assistance machine learning provides.

In fact, the lack of consistent identification, classification, and protection for similar types of data is a security risk. Research from the Ponemon Institute and ObserveIT found that most insider threats were caused by insider negligence, specifically careless employees or contractors. These types of data breaches cost companies an average of $283,281 to deal with.

Until third-party data sets improve – and we know they will – the combination of people and technology is needed to enhance the level of accuracy of data protection and build the trust that machines can act on our behalf with precision.

Like anything in business, the best ROI for any piece of technology is the outcomes it achieves. When it comes to machine learning, the outcome is simple: Improve the accuracy and efficiency of data identification and classification, so you can reduce the risk of data loss.

Mark Cassetta, senior vice president of product management and strategy, is responsible for the execution of product strategy at TITUS. His diverse background, including roles in marketing, business development, corporate strategy, applications development, and enterprise software, helps to inform his approach.


Personal data protection in 2018 and beyond: Q&A with Doug Snow

August 2nd, 2018

Recently, TITUS hosted an ISACA webinar, where Doug Snow, vice president of customer success discussed how to achieve sustained GDPR compliance. Doug provided a number of ways organizations can get executive buy-in and sponsorship, engage and empower end users through change management, and tips for data discovery and mapping.

The audience asked so many great questions that we didn’t have time to answer them all, so we sat down with Doug to address some of the most common questions here.

Do you foresee a GDPR-like regulation being put into place in the US?

Yes, you can count on it. We’re already seeing increased data privacy and protection regulations coming into play in the US: The New York Department of Financial Services (NYDFS) Cybersecurity regulation, and the California Consumer Privacy Act (CCPA) have been introduced and the CCPA went through the legislature in record time. Although we may not see things move as quickly at the national level, there is certainly discussion around the importance of personal data protection.

GDPR set the standard for personal data protection regulations, and the emerging regulations globally reflect the same functional requirements.

With the increasing presence of automation, what are some of the core concerns for data security?

Automation plays an important role in all information security practices, including classification, so it’s entirely possible to categorize metadata objects based on context, such as your directory or information about the type of file. However, it’s important to involve end users because there are many instances where a machine cannot accurately determine the sensitivity of the material.
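
As a small illustration of that split, the Python sketch below classifies a file from its directory and file type alone and returns nothing when the metadata is not enough, which is the point where a person would be asked. The rules, paths, and the function name classify_from_metadata are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical rules; a real deployment would use its own policy.
DIRECTORY_RULES = {"finance": "Restricted", "press": "Public"}
EXTENSION_RULES = {".dwg": "Internal"}


def classify_from_metadata(path):
    """Return a classification based on directory or file type, or None.

    None means the metadata alone is not enough and a person should decide.
    """
    file_path = PurePosixPath(path)
    # Check every directory in the path, ignoring the file name itself.
    for part in file_path.parts[:-1]:
        if part.lower() in DIRECTORY_RULES:
            return DIRECTORY_RULES[part.lower()]
    return EXTENSION_RULES.get(file_path.suffix.lower())


by_directory = classify_from_metadata("/shares/finance/q3/forecast.xlsx")
by_file_type = classify_from_metadata("/shares/engineering/bracket.dwg")
needs_a_person = classify_from_metadata("/home/alex/notes.docx")
```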

You will absolutely be putting more automation into your data security programs, but you can’t take the human element out of the picture completely. As I discussed in the webinar, humans will be integral to refining the algorithms being used: as people make decisions about the sensitivity of data, those decisions feed back into the algorithms and lead to fewer false positives.

Why can’t an organization encrypt everything? Would that not lead to GDPR compliance?

If we could encrypt everything and still get work done, it would have been done already. We can’t do it yet. During the webinar, I mentioned a great paper, “Why Johnny Still, Still Can’t Encrypt: Evaluating the Usability of a Modern PGP Client”, that outlines the challenge in sharing the key to encrypted data.

Usability of encryption across systems is still challenging and, given that we invested in information security to enable business to move faster, the last thing you want is to introduce the frustrations of trying to handle encrypted or locked documents. Do that and you’ve lost the value of electronic information and the speed of information technology, and you’ve impaired the business. So, while encryption is the first thing that comes to mind for a lot of folks, it’s not a practical solution to solve the whole problem. You must encrypt the sensitive material, but only the sensitive material, and you need to know the classification first to accomplish that action.

Is there a trust badge for companies that can be shared on their platforms or websites to state that they are GDPR compliant?

I think that’s a brilliant idea to be able to have a third-party authority that can measure your trustworthiness. After all, it’s a competitive differentiator. But the exact definition of GDPR compliance is still evolving so it might not be possible to achieve that badge yet in today’s world.

Perhaps in the future there will be auditors that can validate that you’ve implemented classification, controls, and cultural change (all reportable), that can be tidily displayed in a scorecard or some attestation of your organization’s level of trustworthiness. I would start with a maturity model. There are many emerging around privacy, including the CMMI Cybersecurity Maturity Model. As you advance in maturity, you can earn badges.

How do I commence the process of effective data classification in my organization?

The most important thing to do is to have all end users start classifying newly created and recently accessed content from this day forward. It isn’t practical or physically achievable to freeze frame an organization and try to discover all its content, the meaning of the content, and apply a classification level to it. The grounds will be shifting under your feet as users are creating new content 24/7 around the globe.

Deploying a tool like TITUS Data Classification is done in conjunction with a few other steps:

You need to have an agreed-upon classification schema that’s been accepted as policy inside the organization and shared across business units. This should be part of an information security policy that addresses the handling procedures and provides guidance on the controls you can put in place.

Most importantly, though, is communication with your user base. Let them know the importance of classification, how you’ll be using it, and what the benefits are to them and the business. This really is a change management initiative – driving a security culture, with privacy built in by design and default.

Based on your experience, what is the duration of a data-mapping classification program, and what are the pitfalls?

Like any project, the speed at which a classification program gets deployed depends on the ability to make decisions. Once you have your decisions made, you know what you want your classification schema to look like, and you have the right approvals, deploying a tool can be as fast as you can physically get it out there.

In terms of pitfalls, one of the most important things to watch for is the dependency on default classifications. It’s tempting not to involve end users, so many organizations move to a default classification. The challenge with that is that you’ll end up classifying all of your content with that same default value. Going this route means missing many opportunities: culture change, education, and accurately mapping the content to the right categories to ensure appropriate protection.

The second biggest pitfall is not having an executive sponsor and failing to leverage a change management program. You need both so your organization can make the important cultural and behavioural changes to protect personal data and comply with GDPR, leveraging whatever technology you choose.

Data protection doesn’t have to be complicated

But it does need a thoughtful approach that involves taking the right steps at the right time. If you missed the ISACA webinar, you can watch the recording here. And be sure to leave us a comment if you have any questions we didn’t cover.