Structured vs unstructured data management
Making successful use of data is an increasingly important priority for businesses, with databases and documents of all types analysed for trends and information that may offer that vital competitive edge. Carrying out this type of analytics productively is often easier said than done, though, not least because of the diversity of data sources out there. It is all too easy to think at a granular level when considering data types: SQL, NoSQL, Excel, Oracle databases and more.

However, it is probably better to take a higher-level view and consider whether the data in question is structured or unstructured, as it’s this, more than the format in which it’s held, that can have the most effect on how it’s managed and analysed. Consequently, it’s worth working out whether your data is structured, unstructured, semi-structured or a combination of all three, and then determining how best to administer it.

What is structured data and how is it managed?
Structured data is often what first comes to mind when you think of both data in general and Big Data analytics. This is the type of information that can be stored in traditional databases composed of columns and rows, and is also known as relational data. A customer database comprising names, addresses, telephone numbers, order frequency and type, and so on is an example of structured data. Likewise, a database for clinical trials recording demographic data, whether a patient is on a placebo or the real treatment, dosage and impact would also be structured data.

To an extent, by its very nature, structured data is already “managed”: it’s kept in an orderly fashion in a single location. Another layer of management can be added to this, however, in the form of a relational database management system (RDBMS). These systems allow users to create, update and administer relational (i.e. structured) databases. The majority are queried using SQL, or a dialect of it, and many, such as MySQL, are open source. A notable exception is Oracle’s database system, Oracle DB, which is proprietary software that’s particularly popular for managing large datasets and is often used in the financial services sector.
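As a minimal sketch of what "columns and rows" means in practice, the Python example below builds a tiny customer table and queries it with SQL. It uses the standard library's sqlite3 module purely for illustration (SQLite is not one of the systems named in this article), and the table name, column names and sample rows are all hypothetical.

```python
import sqlite3


def build_customer_db() -> sqlite3.Connection:
    """Create an in-memory relational database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            city TEXT NOT NULL,
            orders_per_year INTEGER NOT NULL
        )
        """
    )
    # Sample rows are invented for illustration only.
    conn.executemany(
        "INSERT INTO customers (name, city, orders_per_year) VALUES (?, ?, ?)",
        [
            ("Ada Example", "London", 12),
            ("Bob Sample", "Leeds", 3),
            ("Cy Placeholder", "London", 7),
        ],
    )
    conn.commit()
    return conn


def frequent_customers(conn: sqlite3.Connection, min_orders: int) -> list[str]:
    """Return names of customers with at least min_orders orders a year, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE orders_per_year >= ? ORDER BY name",
        (min_orders,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    connection = build_customer_db()
    print(frequent_customers(connection, 5))
```

Because every row has the same columns, a question such as "which customers order at least five times a year?" becomes a single query, which is the property that makes structured data comparatively easy to manage.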
While we won’t be discussing it in depth here, it’s also worth noting that an RDBMS is often embedded in products that offer far more than just managing data and making it available to queries. For example, Salesforce, the cloud-based customer relationship management (CRM) platform, manages the structured data put into it, but also offers tools like chat, access to the Force.com development platform, analytics and so on. So, depending on your needs, it may be worth looking for more than a bare RDBMS.

What is unstructured data and how is it managed?
Unstructured data is anything that can’t be organised into a structured database. Common examples are free-flowing text-based interactions, such as email conversations or chat logs, word processing documents, slideshow presentations, image libraries and videos. While this may not look like data as you would first imagine it, it is often cited as making up over 80% of data in existence and frequently offers a wealth of useful information. Together with structured data, it also accounts for one of the three Vs of Big Data: variety (the other two being velocity and volume).

Unstructured data is more difficult to manage than structured data as it doesn’t have a uniform format, even if the data source is the same. Indeed, managing it in the way structured data is managed is something of a novel idea, as it has only become feasible to mine it for information since Big Data analytics and AI took off.

Unstructured data management (UDM) is essential for successfully making use of all this data. Rather than there being a handful of tools to point to for UDM, there are instead some basic tenets to follow.

Indexing
Sometimes also called “discovery” and other related terms, this involves cataloguing your data to see what’s there, how long it’s existed, how frequently it’s accessed and so on. The aim is to determine whether it’s likely to bring future value to the business and is therefore worth archiving and putting in a UDM system.
Indexing is a long process: it can take weeks to scan and sift all this information, so be prepared to put in a fair amount of time and effort at this stage. It’s also the point at which metatags should be added, to ensure that the information is easily searchable later on.

Storage and availability
With the data sorted, it now needs to be stored in a suitable location with attributes that make it easily and automatically accessible. Storage locations include general cloud storage, such as Microsoft Azure or AWS S3, or on-premises data lakes. Both allow the information to be stored in its “natural” state, which is to say there’s no need to try to put it into database format, and also make it available for automated querying through APIs.

When considering which type of storage to use, it’s also worth taking into account how frequently the data being stored is accessed. If access is relatively infrequent, the data should probably be put into “cold” storage, which is usually much cheaper than keeping it readily accessible at all times, although it is slower to access initially when you finally do need to query it.
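The hot-versus-cold decision described above can be expressed as a simple rule. The sketch below is illustrative only: the 90-day threshold, the function names and the sample files are assumptions, and real cloud providers define their own tier names, retrieval times and pricing, none of which are modelled here.

```python
COLD_AFTER_DAYS = 90  # hypothetical threshold; tune to your own access patterns


def choose_tier(days_since_access: float, cold_after_days: float = COLD_AFTER_DAYS) -> str:
    """Return "cold" for data not accessed within the threshold, otherwise "hot"."""
    return "cold" if days_since_access >= cold_after_days else "hot"


def plan_storage(files, cold_after_days: float = COLD_AFTER_DAYS) -> dict:
    """Group (name, size_bytes, days_since_access) tuples by storage tier.

    Returns a dict mapping each tier to its file names and total size in bytes.
    """
    plan = {
        "hot": {"files": [], "bytes": 0},
        "cold": {"files": [], "bytes": 0},
    }
    for name, size_bytes, days_since_access in files:
        tier = choose_tier(days_since_access, cold_after_days)
        plan[tier]["files"].append(name)
        plan[tier]["bytes"] += size_bytes
    return plan


if __name__ == "__main__":
    # Sample files are invented for illustration only.
    sample = [
        ("launch-video.mp4", 500_000_000, 400),
        ("q3-report.docx", 120_000, 2),
        ("old-thread.eml", 40_000, 90),
    ]
    print(plan_storage(sample))
```

In practice the threshold would come from the access statistics gathered during indexing, and the trade-off is the one noted above: cold data costs less to keep but takes longer to retrieve.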
Date: 2019-09-30
URL: http://feeds.itpro.co.uk/~r/ITPro/Today/~3/eFlGFSvPpHg/structured-vs-unstructured-data-management
itpro.co.uk