Optical Character Recognition is one of those technologies which has gone from being just-about-possible at great expense and hassle, to so mainstream that people just assume it will work flawlessly, all in a relatively few years. Numerous companies offer OCR services or addins to line-of-business systems which help to prepare printed data for easier consumption – scanning invoices for example.
Consumers tend to use OCR in other ways; combined with language translation, you can point your phone at a foreign menu or sign and it may be able to help you understand. In OneNote, if you have captured an image (maybe through the clipper addin from your browser), then it can extract the text from that picture – not always perfectly, and not necessarily well-formatted, but it’s probably quicker than re-typing everything.
Near OCR functionality is also pervading the slew of freely available Office apps for Android tablets, phones and even Chromebooks, and similar versions for iPad and iPhone.
A recent addition to the iOS version of Excel is the ability to scan a table of printed data and use OCR plus a bit of tweaking, to import the data into the spreadsheet. See more here. The same functionality was first made available on Android a couple of months earlier …
Start with the grid capture icon on the toolbar of a new spreadsheet, and then use the camera to highlight the area of a document that you’re interested in – the UI will be familiar to anyone who uses Office Lens, as the same anti-skewing technology is used to prepare the “document” for importing.
Then the OCR goes to work and tries to lay out the data as closely as possible to its source – obviously, your accuracy will be improved by having a well-lit and clear original document, and you’ll get to tweak the contents in context of seeing the OCR’d data and the scan at the same time, before committing to insert it.
Tip o’ the Week 417 – Resolving aliases in bulk
|This might be a very old-Microsoft culture thing, but alias names have always been a relatively big deal within the company; not an alias in the sense of a nom de plume or some alter ego, but a name curiously given to mean your login name.
Before enlightenment, Microsofties were emailed simply by sending to email@example.com – and still are, so even if the primary mail address is firstname.lastname@example.org, you could still mail them at email@example.com, or whatever their alias is.
The alias, or logon name, at Microsoft was historically an up-to-7- or 8-letter moniker, based by default on surname|first-letter-of-forename, eg billg or steveb.
In a company with a handful of people, it was easy to remember such a name for when you wanted to drop them an email, but with hundreds of thousands of mail addresses, you might need more room – when Exchange Server came out in 1996, it supported 64 characters in the alias name, though oddly, Microsoft has never embraced longer than 8-character aliases.
Back in the day, your mailbox was a folder on a Xenix server, then an MS Mail postoffice, and the folder names were restricted by the 8.3 filename format. There are probably too many legacy systems that also have an employee name represented by their 8-letter alias, and it still kinda works.
Aliases – and the cultural phenomenon of calling people not by their name, but by their alias… eg “we had a billg review” – were a central part of the 1995 Douglas Coupland book, Microserfs.
Some people at Microsoft still talk about an email distribution list as an “alias” – eg. “TAKE ME OFF THIS ALIAS!!” as a Reply-All (as opposed to a little “r”) to the occasional mail storms that amazingly still happen. They’re wrong – those are Distribution Lists (DLs) or maybe more correctly, Distribution Groups (DGs).
But the true “alias” lives on, even if the Skypey “Contact Card” UI in Outlook does its best to not show you what someone’s alias is (but you can usually still get to Open Outlook Properties, which shows you the traditional Outlook address book view, with alias in the very top section). Lots of reports from Microsoft’s internal systems will refer to an employee using their alias name, so it often helps if you can decipher an alias into the person behind it.
Resolving an alias to a name one-at-a-time is all very well, but when looking at a column of alias names in some spreadsheet, it’s a bit of a palaver to turn each of the FORENAMS into something meaningful.
Bulk alias resolver in Excel
Fear not, worthy reader, for a solution is to hand.
This can be handy if you’re building Excel reports and want to add names to a table instead of aliases – you could sort the list of aliases alphabetically, run them through the resolver, and then reference the table with a VLOOKUP formula so you could hide the column of aliases from your report and show instead the derived real names.
Tip o’ the Week 396 – Handling dates in Office apps
There are many times when you may need to deal with dates in ordinary applications – and there are a few shortcuts that you can make it easier.
In Outlook, any time there’s a date field (like when you’re setting a reminder, or entering the start date/time for an appointment) you can choose or enter a regular date, or put in an expression – like “2 days” or “next Tuesday” – and Outlook will figure out the offset from today, and will set the appropriate date.
In some date fields (like an appointment start time), if you say “4 days” then press enter or TAB, it will evaluate the new date; if you return and put “4 days” again, it may add those extra days to the last date. Try a few other things like “next Christmas”, “3rd Sunday in November”, “2mo” , “7d” or some special days – there are some surprising ones there, like “Lincoln’s birthday”, and other events with static dates … though nothing that might change the actual date from year to year (like Easter, or Thanksgiving).
In Excel, press CTRL+; to insert the current date into any cell – add a SHIFT key to insert the time instead. Excel are many date-oriented functions, but you don’t always need to write functions – simple maths can work on date fields – calculating the number of days’ difference between two dates, for example, or adding a number of days to a start date.
In the desktop OneNote app, if you want to edit the date and time at the top of a page, click on the field and you’ll see a clock or calendar icon appear next to it – click on that is set to, click on that to change the value; handy if you’re updating some reference material and want to make it clear that it’s recent.
Another way might be to insert the current date or time into the text: to do so, press SHIFT-ALT-D, or SHIFT-ALT-T for the current time, or SHIFT-ALT-F for the current date and time. The last one is really handy if you’re taking notes about a phone call, and want to quickly note the time that your insurance company said that everything was all fine, or when you started the indefinite call to the airline. The same shortcuts apply to the desktop OneNote 2016 application and also the OneNote store app.
Word also supports SHIFT-ALT-D and SHIFT-ALT-T like OneNote, though inserts a date or time field rather than a simple bit of text, and is slightly different to the Date & Time command on the Insert tab, which gives a bit more control over the formatting at the point of insertion, rather than requiring the user to insert the field then go back in to edit the format.
Since Outlook uses Word as its text editor behind the scenes, the same shortcut keys will also insert date fields into the text of an Outlook email.
Tip o’ the Week 329 – Fuzzy Duck? Yes, he does
Sometimes, when writing the ToW, the topic is inspired by a specific problem that someone has emailed me – it’d be a lot better if they’d email me the solution to a problem, but never mind – and sometimes it comes about because of an issue I’ve spent ages struggling with and then happened upon a solution. Today’s is following one of those latter episodes.
Be honest. Do you know how to use the VLOOKUP function in Excel? It has its roots in @LOOKUP from VisiCalc, which goes back well over 30 years – see here for a demo (and, wipe a tear, you missed “VLOOKUP WEEK 2012”).
It’s one of the more useful functions, where you can use tables of text to cross reference one another – leading some to create spreadsheets to manipulate data that might be achieved elsewhere by a database join or an IF…THEN…ELSE statement.
VLOOKUP (and her friends, HLOOKUP, LOOKUP and the other reference functions) is all very well if you have nicely constructed and controlled data – but what if you have messy text that has been entered by end users? How do you go about normalising that without boring brute force (ie ploughing through it all yourself)?
Imagine, if you will, that you have a list of a few hundred company names exported from your CRM system – let’s call them “Partners”. What if you also had many thousands of unique names from people who’ve registered at a conference? (Let’s call that “Partner Conference”). Wouldn’t it be nice to run a report which shows the team that works with each partner, who has registered and where they’re from?
If the registration tool allowed anyone to enter free text fields for the name of their company, you’ll get any number of variations, mis-spellings etc – maybe even the odd deliberate spanner. (On the McXFace front, once again, El Reg excelled itself with this headline, though has a way to go to top the best so far… or the subheading of this one, which reads like a line from a DC Thomson cartoon).
These names won’t allow VLOOKUPs as they’ll show up as all different, and therefore cross-referencing one source with the other will be difficult. So even telling Jane Smith, who manages the ACME Inc account, that these 10 people are attending the conference, is going to be hard if every one of them registered with a variation of A.C.M.E, ACME Inc, Ac-me Ltd and so on.
A relatively little-known Excel addin might come to the rescue (technically described as a technology preview in the EULA, but it’s been around for a little while in its last variation, and a few more before that; so probably is not going to advance a great deal more) – the Fuzzy Lookup Add-in for Excel. Simply take two sources of data (formatted as tables), create one or more mappings between them, and run the tool to see what it comes up with.
The Fuzzy Lookup tool will add extra columns to the source table; showing the text that it thinks is the nearest match, and a score of “similarity”. The technology comes from Microsoft Research, and uses the Jaccard Similarity method of comparing sample data sets.
One technique for comparing a couple of different columns is to set conditional formatting on the Similarity column and choose colour scales for easy identification of the ones likely to be correct; or simply put a filter on that column and hide rows below an arbitrary low bar (like 0.6). Then spin down the two columns to the left and check to see if they tally up, given the human eye for spotting similarity, spelling mistakes etc. You could even add a Y/N column to the right so you can manually affirm which is right and which is not, then filter on that to confirm.
After installing the Fuzzy Lookup addin, you’ll get a fairly detailed Readme and a nicely illustrative Excel sample file showing some share price comparisons (with company names in wildly different formats being matched with eerie accuracy). It might be in preview but it could be exactly what the Excel jockey needs.
Fuzzy Duck? Ducky Fuzz! Does he?
Tip o’ the Week 323 – Some lesser-known Excel spreadsheetery
Spreadsheets did – or do, still – make the modern IT world go round. Until Dan Bricklin and VisiCalc invented the familiar grid-based software environment, business school boffins had to manually write up large grids of numbers on blackboards, and routinely calculate the impact of changes in any one “cell” based on a book of formulae. An error-prone process that could, as you’d imagine, take a long time.
As it happens, VisiCalc powered the Jobs’n’Woz enterprise to mass success, as Apple IIs were selling (even fully kitted out at $5k+ a time, in the 1980s) to middle managers who were sick of the Data Processing department taking ages to turn around financial reports, so they took to sticking an Apple on their desk and doing the sums themselves. If you’re interested in all of this, see here.
This threat from Valley spooked Big Blue enough to worry about the mainframe franchise being under threat, and after a couple of false starts and a skunkworks project called Chess, the PC was born.
Lotus software quickly became the de facto spreadsheet provider running on PC-DOS (as it was so fast, meaning the spreadsheet jockeys ditched their Apple IIs and flocked to PCs), but Lotus got distracted with OS/2 while Microsoft’s Windows 3.0 started to gain traction. In place of WordPerfect, Lotus 1-2-3 and dBase, the 1st party Microsoft Office suite took early and full advantage of Windows, ultimately powering Office to the front. So, Excel trumped 1-2-3, just as Lotus eclipsed VisiCalc.
That means Microsoft Excel for Windows has been around for a very long time, and there are many functions you’ve probably never used – but there are loads of useful tips that could make your life easier. Here are a few…