Ponto is a tool intended for scoring the spellings of young learners.
A typical way to use the tool is to first prepare one’s data in a spreadsheet program, copy and paste it into the Ponto tool, select the type of analysis one wants, and click the Score! button. A server on the Web will do the scoring then return a new table like the one you submitted, but with additional columns scoring each of the spellings. It also returns a table that summarizes the statistics for each speller. These tables can be selected and pasted into a spreadsheet.
Ponto expects the table of data to be arranged by trials. That is, each row except for the column headers contains a stimulus (the word presented to the subject) and a response (the subject’s attempted spelling). It typically will also include a cell identifying the subject. Thus this is a typical table format:
The three column headers must be included exactly as shown here, always in lowercase. The columns can be in any order. You may include other columns in your data; they will be ignored by the program, but echoed back to you in the results. These rules provide for some flexibility, but also a little trap for the unwary: if you label the subject column something other than subject, perhaps sujet, everything will look fine except that the summary table will not break things down by subject. It will just have a single row labelled ALL.
Except for the requirement that the first row must be column headers, the subsequent rows can come in any order. Thus one may choose to arrange the above information like this:
and still get a summary by subject.
You may omit a response, in which case it will be treated as a response with no letters – probably getting a big error score. However, omitting the stimulus cell in a row simply means there was no trial; Ponto will just return --- as the score, and will not include a big error score in the summary statistics for the subject. You can also achieve the same effect by typing --- (3 dashes) as the stimulus or the response.
Perhaps counterintuitively, it is possible to have two or more stimuli for a given trial. Multiple stimuli are separated by the vertical bar character |. For example, a possible entry might be:
One way to think of multiple stimuli is that the entry in the stimulus column is an indirect representation of the actual stimulus, which the participant may interpret in different ways. For example, the experimenter’s pronunciation is a sequence of phones, such as [t̠ʰɹi], but our unit of analysis is phonemes. Some speakers may interpret this as /tri/, others as /tʃri/. Another reason to use multiple stimuli is that different experimenters may pronounce a word slightly differently, or the participants may reinterpret a stimulus in terms of their own pronunciation.
All the examples so far have shown the stimulus as a pronunciation transcription. In other analyses, it may make sense to have the stimulus be a target spelling. For example:
Of course it is unlikely that the actual stimulus in a spelling experiment will be a display of the correct answer. But it may make sense to represent the stimulus as the orthographic spelling in Ponto if your model of analysis is that the participant is reaching for a particular spelling, and you wish to score her response in terms of how different it is from that target.
Your table of trial data is entered into the text box labelled Trials, in any of several ways. Note that a possible header field is already entered for you, but in most cases you will want to delete the contents of the box before you enter your own data (e.g., click in the box, Ctrl-a, Delete).
subject,stimulus,response
1,haʊs,hous
1,fɑrm,farm
2,haʊs,house
2,fɑrm,farm
3,haʊs,howse
3,fɑrm,frm
Instead of a comma you can use a tab or a space as a separator, but those have the disadvantage that it is very hard to tell when you have two or more of them in a sequence. Recall that a cell can be empty, so sometimes separators will be adjacent to each other. Tabs have the additional disadvantage that most browsers are set up so that hitting the Tab key is a special shortcut for taking you out of the Trials box and into the Correspondences box. Whichever separator you use, you must use the same one throughout the whole table.
Spaces are ignored at the beginning or end of a cell.
If you wish, you may include quotation marks around your items, e.g., 'hous' or "hous". This is provided mostly as a convenience for those who save CSV files from spreadsheets that add quotation marks automatically. But you may also like the idea of representing empty content, such as null responses, as "" instead of just leaving invisible space. Ponto does not do anything with quotation marks but ignore them. In particular, if you use a comma or space as a separator character, Ponto will not treat them as data just because they appear inside quotes.
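The formatting rules above can be sketched as a small parser. This is only an illustration of the rules as described, not Ponto's own code: the function names parse_trials and clean_cell are hypothetical, and stripping only the surrounding quotation marks is an assumption about what "ignoring" them means.

```python
def clean_cell(cell):
    """Trim spaces at both ends, then drop surrounding quotation marks.

    Assumption: only quotes wrapped around the item are discarded.
    """
    return cell.strip().strip("'\"").strip()


def parse_trials(text, sep=","):
    """Parse a Trials table into a list of dicts keyed by column header.

    The same separator is used for every row. It is split on even when it
    sits inside quotes, because quotes do not protect separators.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [clean_cell(c) for c in lines[0].split(sep)]
    rows = []
    for line in lines[1:]:
        cells = [clean_cell(c) for c in line.split(sep)]
        # Pad short rows so that omitted trailing cells read as empty.
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows


trials = parse_trials('subject,stimulus,response\n1,haʊs, "hous"\n2,fɑrm,\n')
```

Here the second trial has an empty response cell, which would be scored as a response with no letters.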
The Ponto tool operates with Unicode UTF-8 encoding. If you are running a good, standard, modern computer, everything should work fine and you won’t need to wonder what UTF-8 means. If the characters change to some other character when you copy and paste between programs, then it is likely that one of your programs does not understand Unicode, or that you are using fonts in some nonstandard way. In particular, some old schemes for typing the International Phonetic Alphabet involved using special fonts that might, for example, display the shape ʃ when they see the code for ordinary S. If you use such a scheme, pasting your text into the Ponto tool will result in your seeing the letter for what it really is, an S.
Ponto uses a simple model of spelling: it is context-free. For example, it can be told that /s/ is spelt <s>, <c>, or <ss>, but it cannot be told to accept <ss> only after vowel letters.
As discussed a little later below, there are several predefined correspondence sets you can use. But if you wish to define your own, or to make modifications to existing schemes, you can enter the rules into the Correspondences text box. Some examples:
The first three rows of this incomplete table illustrate how to enter the rule just discussed. Each row gives one way of spelling /s/. The heading stimulus actually means ‘unit of correspondence in the entries in the stimulus column of the trials’, and response refers to the units of correspondence in the response column of the trials. Thus if your stimuli are pronunciations, the units in the stimulus column here will be sounds; if your stimuli are represented in terms of target spellings, the stimulus column here will give letters.
The penalty cell tells how big an error it would be to use the spelling in the response column to spell the unit in the stimulus column. The intention is that a correct spelling should receive the penalty 0, and that errors would get positive numbers. The table above illustrates that <s>, <c>, or <ss> are all considered perfectly fine spellings for /s/, but that the spelling <cc>, though close and credible, should be counted slightly incorrect. But the only firm requirement is that the entries be numbers: all digits, with possibly one decimal point. Ponto will score subjects’ spellings by adding up the penalty scores for all the correspondence rules needed to align the stimulus with the spelling. It is perfectly normal to specify only correct correspondences, and thus to give all of the correspondence rules the penalty 0, and let Ponto assign default penalties (see below) for alignments that are not specified explicitly.
It is possible to leave out the response unit from a row, in which case the correspondence tells how big a penalty to assign if the subject does not spell the stimulus unit. Most users will prefer to just use the Deletion box below the Correspondences box to indicate a uniform penalty to assign to deletions, but here one may enter specific penalties for specific deletions. The table above gives the example of assigning a penalty to not spelling a schwa; one might reasonably want to count its omission as less important than the omission of other sounds. Or, if the stimuli are expressed as target spellings, one might imagine wanting to give a smaller penalty to leaving out nonalphabetic characters such as hyphens.
Conversely, it is possible to omit a stimulus unit and give only a response unit. That tells the penalty for inserting letters into the spelling which do not spell anything in the input. A uniform penalty for insertions is given in the insertion box, but these entries override that box. The table above gives an example of assigning only a tiny penalty for putting an illegal hyphen in a word.
Unlike in the Trials table, the | separator has no role here: you cannot, for example, specify multiple stimulus units or multiple response units in the same row. Just enter multiple rows, even if some of them have the same stimulus or the same response as another row.
The rules for formatting the Correspondences table are much the same as for Trials. Columns can be separated by comma, tab, or space.
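One way to picture the Correspondences table is as a lookup from (stimulus unit, response unit) pairs to penalties, with an empty stimulus marking an insertion and an empty response marking a deletion. The sketch below assumes that representation. The function name parse_correspondences is hypothetical, and the nonzero penalty values (0.5 and 0.1) are invented for illustration; only the zero penalties for <s>, <c>, and <ss> come from the discussion above.

```python
def parse_correspondences(text, sep=","):
    """Parse a Correspondences table into {(stimulus, response): penalty}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].split(sep)]
    table = {}
    for line in lines[1:]:
        row = dict(zip(header, (c.strip() for c in line.split(sep))))
        # One row per correspondence; there is no | separator here.
        table[(row["stimulus"], row["response"])] = float(row["penalty"])
    return table


EXAMPLE = """stimulus,response,penalty
s,s,0
s,c,0
s,ss,0
s,cc,0.5
ə,,0.5
,-,0.1
"""

table = parse_correspondences(EXAMPLE)
```

The row with an empty response cell gives a specific deletion penalty for schwa, and the row with an empty stimulus cell gives a specific insertion penalty for a hyphen.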
Sometimes it is best to use somebody else’s set of correspondences, not only to save time but also to ensure some standardization across studies. You can, of course, achieve this by exchanging files containing the correspondences and copying and pasting their contents. Ponto also has several correspondences built in, which you can use simply by selecting a pulldown option. These are listed underneath the Correspondences text box. Hover your mouse over their names to get a very brief description as a tool tip; click on their names to select them. Currently available sets are:
Some issues to keep in mind when using correspondence sets or defining your own:
Ponto will use all correspondences in all the named sets you check as well as those you include in the Correspondences box. If a row in the Correspondences box has the same stimulus unit and the same response unit as a row in one of the named sets, the penalty specified in the Correspondences box prevails. This is one way of making small alterations to a named set.
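The override rule can be sketched as a dictionary merge in which the Correspondences box is applied last. This is a minimal sketch with hypothetical names (merge_correspondences, named_sets, box). How Ponto resolves a conflict between two named sets is not stated here; in this sketch the later set simply wins, which is an assumption.

```python
def merge_correspondences(named_sets, box):
    """Combine named sets with the Correspondences box.

    Each argument maps (stimulus unit, response unit) to a penalty.
    Entries from the box prevail over entries from any named set.
    """
    merged = {}
    for named in named_sets:
        merged.update(named)  # assumption: later named sets override earlier ones
    merged.update(box)        # the box always has the last word
    return merged


named = {("s", "s"): 0.0, ("s", "cc"): 1.0}
box = {("s", "cc"): 0.5}  # a small alteration to the named set
merged = merge_correspondences([named], box)
```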
The checkboxes under Score correspondences not found in above tables specify how Ponto will score correspondences not explicitly specified elsewhere. They start up with some useful defaults. Insertion tells how much of a penalty to give when the child inserts a spelling unit that doesn’t correspond to any unit of the stimulus; the default is 1. Deletion is the opposite: the penalty for not spelling a unit of the stimulus; its default is also 1. Substitution is the penalty for spelling a stimulus unit with a response unit other than those specified in the correspondence sets. Its default is 1.4, because of Pythagoras (1.4 ≈ √2, the hypotenuse of a right triangle whose two legs are each 1, like the default insertion and deletion penalties).
The checkbox Require correct sequence switches between two rather different ways of scoring.
When it is not checked, Ponto performs a very lenient scoring. It looks at each unit in the stimulus, then tries to find in the response the best representation of that unit; that is, the correspondence with the lowest penalty. It does not care about order, or even about reusing a spelling that was matched with a prior stimulus unit. Any extraneous letters are ignored, i.e., given a penalty of 0, regardless of the setting of the Insertion checkbox. For example, if the correspondences include /æ/=<a>, /k/=<c> and /t/=<t>, then “attacks” for /kæt/ would get an error score of 0. Some correspondence sets are based on the assumption that this box will be unchecked.
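The lenient method can be sketched roughly as follows. This is an illustration under stated assumptions, not Ponto's implementation: the name lenient_score is hypothetical, the stimulus is assumed to be already divided into units, and a stimulus unit with no matching spelling anywhere in the response is assumed to receive the default deletion penalty.

```python
def lenient_score(stimulus_units, response, correspondences, deletion=1.0):
    """Order-free scoring: each stimulus unit takes its cheapest match.

    correspondences maps (stimulus unit, response unit) to a penalty.
    """
    total = 0.0
    for unit in stimulus_units:
        candidates = [
            penalty
            for (stim, resp), penalty in correspondences.items()
            # An empty resp (an explicit deletion rule) is found in any
            # response, because "" is a substring of every string.
            if stim == unit and resp in response
        ]
        # Assumption: fall back to the default deletion penalty.
        total += min(candidates, default=deletion)
    # Letters of the response that matched nothing are never penalized.
    return total


rules = {("æ", "a"): 0.0, ("k", "c"): 0.0, ("t", "t"): 0.0}
score = lenient_score(["k", "æ", "t"], "attacks", rules)
```

With these three rules, “attacks” for /kæt/ scores 0, because each stimulus unit finds a permitted spelling somewhere in the response and the leftover letters cost nothing.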
When the box is checked, Ponto attempts a strict sequential match. It tries every possible alignment of stimulus units with response units, then selects the alignment that gives the best score (lowest cumulative penalty). It does not allow alignment lines to cross: spellings out of sequential order are considered to be insertions. Thus “attacks” for /kæt/ would likely get credit for the initial <a> and one of the two following <t>s, but not for the following <c> or <k>, because those come out of sequence. Thus the overall error score would be fairly high, with the “acks” tail all being considered insertions. Of course, the precise details depend on the correspondence set. This sort of alignment is essentially a Levenshtein metric, except that it is not symmetrical: switching the stimulus and the response will usually give a different score.
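A strict sequential match of this kind can be sketched as a weighted edit-distance search with memoization. The code below is a simplified illustration, not Ponto's implementation: the name strict_score is hypothetical, default insertions, deletions, and substitutions are assumed to operate on single characters, and explicit insertion or deletion rules are honoured only for single-character units.

```python
from functools import lru_cache


def strict_score(stimulus, response, correspondences,
                 insertion=1.0, deletion=1.0, substitution=1.4):
    """Lowest cumulative penalty over all non-crossing alignments."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # i and j are positions reached in the stimulus and the response.
        if i == len(stimulus) and j == len(response):
            return 0.0
        options = []
        if i < len(stimulus):
            # The stimulus unit is left unspelt (deletion).
            cost = correspondences.get((stimulus[i], ""), deletion)
            options.append(cost + best(i + 1, j))
        if j < len(response):
            # The response letter spells nothing (insertion).
            cost = correspondences.get(("", response[j]), insertion)
            options.append(cost + best(i, j + 1))
        if i < len(stimulus) and j < len(response):
            # Default substitution, unless an explicit rule covers the pair.
            if (stimulus[i], response[j]) not in correspondences:
                options.append(substitution + best(i + 1, j + 1))
        for (stim, resp), penalty in correspondences.items():
            # Explicit rules; units may span several characters, e.g. <ss>.
            if stim and resp and stimulus.startswith(stim, i) \
                    and response.startswith(resp, j):
                options.append(penalty + best(i + len(stim), j + len(resp)))
        return min(options)

    return best(0, 0)


rules = {("æ", "a"): 0.0, ("k", "c"): 0.0, ("k", "k"): 0.0, ("t", "t"): 0.0}
score = strict_score("kæt", "attacks", rules)
```

Under these assumed rules and the default penalties, the cheapest alignment of “attacks” with /kæt/ leaves /k/ unspelt, credits <a> and one <t>, and counts the five remaining letters as insertions, for a total of 6.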
The checkbox Run Monte Carlo significance test asks for additional information to be returned with the results, telling how much better each subject did than would be expected by chance. The chance level is taken to be the score a speller would get by generating spellings without any regard for the rules. This is determined by randomly rearranging the pairing between the stimuli and the responses 1,000 times, each time measuring the total error score. If the child was actually using at least some of the rules some of the time, and paying attention to what word he was supposed to be spelling, his actual total score (distSum, for sum of distances) should be less than the average total score across those 1,000 random rearrangements (distRand). The proportional improvement of distSum over distRand is reported as improve. Additionally, Ponto will compute a p value: the proportion of the rearrangements whose score was better than the observed distSum. This estimates the probability of obtaining a score at least this good if the null hypothesis were true, namely, that any improvement over chance was in fact coincidental. A low p value means that the child is probably paying attention (perhaps implicitly) to the correspondences and the stimuli. Naturally, it takes much longer for Ponto to return results if the Monte Carlo option is checked, so that box should be left unchecked when this information is not needed.
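The rearrangement procedure can be sketched for a single subject as below. This is a minimal sketch, not Ponto's code: the function name monte_carlo and its parameters are hypothetical, the distance argument stands in for whichever scoring method is selected, and the formula used for improve, (distRand - distSum) / distRand, is an assumption about what "proportional improvement" means.

```python
import random


def monte_carlo(stimuli, responses, distance, n_shuffles=1000, seed=None):
    """Compare one subject's total score with randomly re-paired trials."""
    rng = random.Random(seed)
    dist_sum = sum(distance(s, r) for s, r in zip(stimuli, responses))
    shuffled = list(responses)
    totals = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # re-pair the responses with the stimuli
        totals.append(sum(distance(s, r) for s, r in zip(stimuli, shuffled)))
    dist_rand = sum(totals) / n_shuffles
    # Assumed formula for the proportional improvement over chance.
    improve = (dist_rand - dist_sum) / dist_rand if dist_rand else 0.0
    # Proportion of rearrangements that scored better than the real pairing.
    p_rand = sum(t < dist_sum for t in totals) / n_shuffles
    return {"distSum": dist_sum, "distRand": dist_rand,
            "improve": improve, "pRand": p_rand}


def toy_distance(stimulus, response):
    """Stand-in scorer: 0 for an exact match, 1 otherwise."""
    return 0.0 if stimulus == response else 1.0


words = ["a", "b", "c", "d"]
result = monte_carlo(words, words, toy_distance, seed=0)
```

A perfect speller, as in this toy run, has a distSum of 0, so no rearrangement can score better and pRand is 0.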
The button labelled Score! sends the information to the server, which performs the analysis and returns the results.
Results are returned as a new page that replaces your original query page, although you should be able to use the browser’s Go Back command to get your original query page back. The results page begins with a Parameters section that basically repeats the options you specified on the query page; the idea is that if you save this page as a file, you will have enough context to understand the options that can affect the results. In the Correspondences subsection, all correspondences are listed, including the ones you had brought in from named correspondence sets. Though it can be tedious to scroll past this information, occasionally it can be vitally important.
The meat of the results page is the next two sections. The Scores section basically repeats the data you submitted in the Trials section, but adds two new columns at the end of the table: distance and align. The distance is the error: how far away the child’s spelling is from correct. This is the sum of all penalties incurred when attempting to align the stimulus with the response: explicit correspondence penalties as well as default penalties for alignments not explicitly licensed. There are often multiple ways to align the stimulus with the response; Ponto always picks the best alignment, the one that gives the child the most credit, that is, the lowest distance score. The align column shows the alignment that gives that lowest score; if more than one alignment has that same score, one of those alignments is selected arbitrarily. An example of the format is [c=p*][a=a][=s*][t=t][=a*], which is a plausible alignment when one spells “pasta” for cat. Each correspondence is enclosed in square brackets, with the stimulus unit preceding the response unit, separated by an = sign. For an insertion, no stimulus unit appears; for a deletion, no response unit appears. Whenever a correspondence has a penalty greater than 0, it is followed by an asterisk. The alignments give a wealth of information that can be examined to do detailed error analyses. But even when such detailed analysis is not required, it is a good idea to examine a representative sample of alignments, in order to fully understand how the scoring process works under a given correspondence set.
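For detailed error analyses, the align column can be taken apart with a regular expression. The sketch below is a hypothetical helper (parse_alignment is not part of Ponto) and assumes that the units themselves contain no literal ], *, or, on the stimulus side, = characters.

```python
import re

# [stimulus=response] with an optional * marking a penalty greater than 0.
ALIGN_RE = re.compile(r"\[([^=\]]*)=([^\]*]*)(\*?)\]")


def parse_alignment(align):
    """Split an align string into (stimulus, response, penalized) triples."""
    return [(stim, resp, bool(star))
            for stim, resp, star in ALIGN_RE.findall(align)]


pairs = parse_alignment("[c=p*][a=a][=s*][t=t][=a*]")
# An empty stimulus marks an insertion; an empty response marks a deletion.
insertions = [resp for stim, resp, _ in pairs if not stim]
spelling = "".join(resp for _, resp, _ in pairs)
```

For the example alignment, insertions holds the letters s and a, and joining the response units recovers the child's spelling, pasta.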
The Summaries section aggregates the information. If the Trials data contains a subject column, then the aggregation is by subject, giving one row of information for each subject; if not, the section just contains a single row for the entire run. In all cases, the Summaries table contains a distSum column, which sums all the distance scores for the subject; nTrials, which tells how many trials the subject had; and distMean, which divides distSum by nTrials to give the subject’s average error score. If Monte Carlo tests were asked for, there are 3 additional columns: distRand, improve, and pRand, as described earlier. The distRand column is the average score after random rearrangement; improve is the proportional improvement of distSum over distRand; and pRand is the p value for the null hypothesis that the improvement is due to chance.
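The basic aggregation can be reproduced from the Scores table along these lines. This is a sketch with a hypothetical function name (summarize). It assumes that rows scored --- are left out of nTrials as well as distSum, and it uses the label ALL when there is no subject column.

```python
from collections import defaultdict


def summarize(scored_rows):
    """Aggregate distance scores into distSum, nTrials, and distMean."""
    groups = defaultdict(list)
    for row in scored_rows:
        if row["distance"] == "---":
            continue  # no trial took place, so nothing is counted
        groups[row.get("subject", "ALL")].append(float(row["distance"]))
    return {
        subject: {
            "distSum": sum(distances),
            "nTrials": len(distances),
            "distMean": sum(distances) / len(distances),
        }
        for subject, distances in groups.items()
    }


scores = [
    {"subject": "1", "distance": "1"},
    {"subject": "1", "distance": "2"},
    {"subject": "2", "distance": "0"},
    {"subject": "2", "distance": "---"},
]
summary = summarize(scores)
```

In this toy input, subject 1 has two trials with a distMean of 1.5, and subject 2 has a single counted trial.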
The Results tables are returned as preformatted HTML text, not as HTML tables. This means that they may not be lined up very nicely, but the browser will display them fast and will not be put off by tables that contain hundreds or thousands of rows. If you wish, you can select a table and paste it into your favourite software, such as a spreadsheet. If your first attempt does not give favourable results, you may find that there is a command such as paste special that lets you get fine-grained control over how the data is entered. The browser presents the tables as tab-separated tables of Unicode text.
Please keep in mind that the Web is not inherently secure: eavesdroppers can potentially intercept and read your Trials data. Subject-identifying information should be omitted. If this is not possible, Ponto should be run only behind a firewall on a local network, or even on one’s own computer; or it should be run under HTTPS.
Last change 2009-11-21T00:49:05-0600